Abstract. Audio signal processing frequently requires time-frequency representations and, in many applications, a non-linear spacing of frequency bands is preferable. This paper introduces a framework for the efficient implementation of invertible signal transforms allowing for non-uniform and, in particular, non-linear frequency resolution. Non-uniformity in frequency is realized by applying nonstationary Gabor frames with adaptivity in the frequency domain. The realization of a perfectly invertible constant-Q transform is described in detail. To achieve real-time processing, independent of signal length, slice-wise processing of the full input signal is proposed and referred to as the sliCQ transform.

By applying frame theory and FFT-based processing, the presented approach overcomes the computational inefficiency and lack of invertibility of classical constant-Q transform implementations. Numerical simulations evaluate the efficiency of the proposed algorithm, and the method's applicability is illustrated by experiments on real-life audio signals.



1. Introduction 

Analysis, synthesis and processing of sound are commonly based on the representation of audio signals by means of time-frequency dictionaries. The short-time Fourier transform (STFT), also referred to as the Gabor transform, is a widely used tool due to its straightforward interpretation and its FFT-based implementation, which ensures efficiency and invertibility [11], [7]. The STFT features a uniform time and frequency resolution and a linear spacing of the time-frequency bins.

In contrast, the constant-Q transform (CQT), originally introduced in [22] and in music processing by J. Brown [5], provides a frequency resolution that depends on geometrically spaced center frequencies of the analysis windows. In particular, the Q-factor, i.e. the ratio of center frequency to bandwidth of each window, is constant over all frequency bins; the constant Q-factor leads to a finer frequency resolution at low frequencies, whereas time resolution improves with increasing frequency. This principle makes the constant-Q transform well suited for audio data, since it reflects the resolution of the human auditory system better than the linear frequency spacing provided by the FFT, cf. [20] and references therein. Furthermore, musical characteristics such as overtone structures remain invariant under frequency shifts in a constant-Q transform, which is a natural feature from a perceptual point of view. In speech and music processing, perception-based considerations are important, which is one of the reasons why CQTs, owing to the properties discussed above, are often desirable in these fields. An example of a CQ-transform, obtained with our algorithm, is shown in Figure 1.

The principal idea of the CQT is reminiscent of wavelet transforms, compare [19]. As opposed to wavelet transforms, the original CQT is not invertible and does not rely on any concept of (orthonormal) bases. On the other hand, the number of bins (frequency channels) per octave is much higher in the CQT than most traditional wavelet techniques would allow for. Partly due to this requirement, the computational efficiency of the original transform, as well as of its improved versions, cf. [3], may often be insufficient. Moreover, the lack of invertibility of existing CQTs has become an important issue: for some desired applications, such as the extraction and modification (e.g. transposition) of distinct parts of the signal, unbiased reconstruction from the analysis coefficients is crucial. Approximate methods for reconstruction from constant-Q coefficients have been proposed before, in particular for signals which are sparse in the frequency domain [5] and by octave-wise processing in [18].

In the present contribution, we are interested in inversion in the sense of perfect reconstruction, i.e. up to numerical precision; to this end, we investigate a new approach to constant-Q signal processing. The presented framework has the following core properties:

(1) Relying on concepts from frame theory [15], we suggest implementing a constant-Q transform by means of the nonstationary Gabor transform (NSGT), which guarantees perfect invertibility. This perfectly invertible constant-Q transform is subsequently called the constant-Q nonstationary Gabor transform (CQ-NSGT).

(2) We introduce a preprocessing step that cuts the signal into slices of (usually uniform) finite length. Together with FFT-based methods, this allows for bounded delay and results in linear processing time. Thus, our algorithm lends itself to real-time processing, and the resulting transform is referred to as the sliced constant-Q transform (sliCQ).

NSGTs, introduced in [17], [1], generalize the classical sampled short-time Fourier transform or Gabor transform [15], [10]. They allow for a fast, FFT-based implementation of both analysis and reconstruction under mild conditions on the analysis windows. The CQ-NSGT was first presented in [21]; its frequency resolution is essentially identical to that of the CQT, cf. Figure 1 for an example.

The main drawback of the CQ-NSGT is the inherent necessity to compute a Fourier transform of the entire signal prior to actual processing. This prohibits real-time implementation and is overcome by a slicing step which preserves the perfect reconstruction property. However, blocking effects and time-aliasing may be observed if the coefficients are modified in applications such as de-noising, transposition or time-shifting of certain signal components. While slicing the signal naturally introduces a trade-off between delay and the finest possible frequency resolution, the parameters can be chosen to suppress blocking artifacts and to leave the constant-Q coefficient structure intact.

The rest of this paper is organized as follows. In Section 2, we introduce the concept of frames as overcomplete, stable spanning sets, with a focus on nonstationary Gabor (NSG) systems and their properties. We recall the conditions for these systems to constitute so-called painless frames, a special case that allows for straightforward inversion. Section 3 describes the construction of the CQ-NSGT by NSG frames with adaptivity in the frequency domain. This is the starting point for the sliCQ transform, which is explored in Section 4. After giving the general idea, we describe the interpretation of the sliCQ coefficients in relation to the full-length transform in Section 4.3. Subsequently, Section 5 is concerned with an analysis of the transforms' numerical properties, in particular computation time and complexity, as well as the quality of approximation of the CQ-NSGT coefficients by the sliCQ, accompanied by a set of simulations. Finally, in Section 6, the CQ-NSGT is applied and evaluated in the analysis and processing of real-life signals. The paper closes with a short summary and conclusion.

Figure 1. Time-frequency representations on a logarithmically scaled frequency axis: STFT spectrogram (top) and constant-Q NSGT spectrogram (bottom).

2. NONSTATIONARY GABOR FRAMES 

Frames, first mentioned in [5], also cf. [11], generalize (orthonormal) bases and allow for redundancy and thus design flexibility in signal representations. Frames may be tailored to a specific application or to certain requirements such as a constant-Q frequency resolution. Loosely speaking, we wish to represent a given signal of interest as a sum of the frame members $\varphi_{n,k}$ weighted by coefficients $c_{n,k}$:

$$f = \sum_{n,k} c_{n,k}\, \varphi_{n,k}. \tag{1}$$
The double indices $(n,k)$ allude to the fact that each atom has a certain location and concentration in time and frequency. Frame theory establishes conditions under which an expansion of the form (1) can be obtained with coefficients leading to stable, perfect reconstruction.

For this contribution, we only consider frames for $\mathbb{C}^L$, that is, vector spaces of finite, discrete signals, understood as functions $f, g$ on $\mathbb{Z}_L = \{0, \dots, L-1\}$. We denote by $\langle f, g\rangle$ the inner product of $f$ and $g$, i.e. $\langle f, g\rangle = \sum_{l=0}^{L-1} f[l]\, \overline{g[l]}$, and $\|f\|_2 = \sqrt{\langle f, f\rangle}$. The structures introduced here can easily be extended to the Hilbert space of square-integrable functions, $L^2(\mathbb{R})$.

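2.1. Frames. Consider a collection of atoms $\varphi_{n,k} \in \mathbb{C}^L$ with $(n,k) \in I_N \times I_K$ for finite index sets $I_N$, $I_K$. We then define the frame operator $S$ by

$$Sf = \sum_{n,k} \langle f, \varphi_{n,k}\rangle\, \varphi_{n,k}, \tag{2}$$

for all $f \in \mathbb{C}^L$. If the linear operator $S$ is invertible on $\mathbb{C}^L$, then the set of functions $\{\varphi_{n,k}\}_{(n,k) \in I_N \times I_K}$ is a frame¹. In this case, we may define a dual frame by

$$\tilde\varphi_{n,k} = S^{-1}\varphi_{n,k}, \tag{3}$$

and reconstruction from the coefficients $c_{n,k} = \langle f, \varphi_{n,k}\rangle$ is straightforward:

$$f = S^{-1}Sf = \sum_{n,k} \langle f, \varphi_{n,k}\rangle\, \tilde\varphi_{n,k} = SS^{-1}f = \sum_{n,k} \langle f, \tilde\varphi_{n,k}\rangle\, \varphi_{n,k}.$$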
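The identities above can be checked numerically on a small example. The following sketch (Python/NumPy, an illustration only, not the authors' toolbox) stores a random, redundant collection of 20 atoms in $\mathbb{C}^8$ as matrix columns, builds the frame operator (2), forms the canonical dual (3) and verifies both reconstruction formulas. The sizes and the random atoms are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, num_atoms = 8, 20                      # 20 atoms in C^8: redundancy 2.5
# each column is one frame member phi_{n,k}; random atoms span C^8 almost surely
atoms = rng.standard_normal((L, num_atoms)) + 1j * rng.standard_normal((L, num_atoms))

def coefficients(f, atoms):
    """c_{n,k} = <f, phi_{n,k}> = sum_l f[l] conj(phi_{n,k}[l])."""
    return atoms.conj().T @ f

# frame operator (2): S f = sum <f, phi> phi, i.e. S = Phi Phi^H
S = atoms @ atoms.conj().T
# canonical dual frame (3): phi~ = S^{-1} phi, solved for all columns at once
dual_atoms = np.linalg.solve(S, atoms)

f = rng.standard_normal(L) + 1j * rng.standard_normal(L)
f_from_dual = dual_atoms @ coefficients(f, atoms)     # sum <f, phi> phi~
f_from_frame = atoms @ coefficients(f, dual_atoms)    # sum <f, phi~> phi
print(np.allclose(f_from_dual, f), np.allclose(f_from_frame, f))
```

Solving with `np.linalg.solve` instead of forming $S^{-1}$ explicitly is the usual numerically safer choice; for large, unstructured frames this inversion is exactly the cost that the painless case below avoids.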

We next introduce a case of particular importance, the so-called Gabor frames, for which the elements $\varphi_{n,k}$ are obtained from a single window $\varphi$ by time- and frequency-shifts along a lattice. Let $T_x$ and $M_\omega$ denote a time shift by $x$ and a frequency shift (or modulation) by $\omega$, i.e.

$$T_x f[l] = f[l-x] \quad\text{and}\quad M_\omega f[l] = e^{2\pi i \omega l/L} f[l],$$

where $l - x$ is considered modulo $L$. Furthermore, we use the normalization

$$\mathcal{F}f[j] = \hat f[j] = \frac{1}{\sqrt{L}} \sum_{l=0}^{L-1} f[l]\, e^{-2\pi i jl/L}$$

for the discrete Fourier transform of $f$. It follows that $\mathcal{F}(T_x f) = M_{-x}\hat f$ and $\mathcal{F}(M_\omega f) = T_\omega \hat f$.

Fixing a time-shift parameter $a$ and a frequency-shift parameter $b$, with $L/a, L/b \in \mathbb{N}$, we call the collection of atoms $\mathcal{G} = \{\varphi_{n,k} = M_{kb}T_{na}\varphi\}_{(n,k) \in I_N \times I_K}$, with $I_N \times I_K = \mathbb{Z}_{L/a} \times \mathbb{Z}_{L/b}$, a Gabor system. If $\mathcal{G}$ is a frame, it is called a Gabor frame. For Gabor frames, the frame coefficients are given by samples of the short-time Fourier transform (STFT) of $f$ with respect to the window $\varphi$:

$$c_{n,k} = \langle f, \varphi_{n,k}\rangle = \langle f, M_{kb}T_{na}\varphi\rangle = \sum_{l=0}^{L-1} f[l]\, \overline{\varphi[l-na]}\, e^{-2\pi i kbl/L}. \tag{4}$$



¹ Note that if $\{\varphi_{n,k}, (n,k) \in I_N \times I_K\}$ is an orthonormal basis, then $S$ is the identity operator.
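For a Gabor system, the sum in (4) can be evaluated with one FFT per time position: multiply $f$ by the conjugated, shifted window and keep every $b$-th DFT bin. The NumPy sketch below (illustrative; the window, the sizes and the function names are our own choices) compares this with a direct evaluation of $\langle f, M_{kb}T_{na}\varphi\rangle$. NumPy's unnormalized `fft` matches the sum in (4) without extra scaling.

```python
import numpy as np

def gabor_coefficients_fft(f, window, a, b):
    """c[n, k] = sum_l f[l] conj(window[l - n a]) exp(-2 pi i k b l / L), one FFT per n."""
    L = len(f)
    assert L % a == 0 and L % b == 0, "L/a and L/b must be integers"
    c = np.empty((L // a, L // b), dtype=complex)
    for n in range(L // a):
        shifted = np.roll(window, n * a)                 # T_{na} window, indices modulo L
        c[n] = np.fft.fft(f * np.conj(shifted))[::b]     # keep the bins j = k b
    return c

def gabor_coefficients_direct(f, window, a, b):
    """Same coefficients as inner products <f, M_{kb} T_{na} window>."""
    L = len(f)
    l = np.arange(L)
    c = np.empty((L // a, L // b), dtype=complex)
    for n in range(L // a):
        for k in range(L // b):
            atom = np.exp(2j * np.pi * k * b * l / L) * np.roll(window, n * a)
            c[n, k] = np.vdot(atom, f)                   # vdot conjugates its first argument
    return c

rng = np.random.default_rng(0)
L, a, b = 16, 4, 2
f = rng.standard_normal(L)
window = np.zeros(L)
window[:8] = np.hanning(8)
window = np.roll(window, -4)        # hypothetical window, roughly centered at index 0
c_fft = gabor_coefficients_fft(f, window, a, b)
print(c_fft.shape, np.allclose(c_fft, gabor_coefficients_direct(f, window, a, b)))
```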
In a general setting, the inversion of the operator $S$ poses a problem in the numerical realization of frame analysis. However, for Gabor frames, it was shown in [6] that under certain conditions, usually fulfilled in practical applications, $S$ is diagonal and a dual frame can be calculated easily. This situation of painless non-orthogonal expansions can now be generalized to allow for adaptive resolution.

2.2. Frequency-Adaptive Painless Nonstationary Gabor Frames. In classical Gabor frames, we obtain all samples of the STFT in (4) by applying the same window $\varphi$, shifted along a regular set of sampling points, and taking FFTs of the same length. In order to achieve adaptivity of the resolution in either time or frequency, we relax the regularity of classical Gabor frames to derive nonstationary Gabor frames.

The original motivation for the introduction of the NSGT was the desire to adapt both window size and sampling density in time, cf. [1], [17], in order to accurately resolve transient signal components. Here, we apply the same idea in frequency, i.e. we adapt both the bandwidth and the sampling density in frequency. From an algorithmic point of view, we apply a nonstationary Gabor system to the Fourier transform of the input signal.

The windows are constructed directly in the frequency domain by taking real-valued filters $\hat g_k$ centered at $\omega_k$. The inverse Fourier transforms $g_k := \mathcal{F}^{-1}\hat g_k$ are the time-reversed impulse responses of the corresponding (frequency-adaptive) filters. Therefore, we let $g_k$, $k \in I_K$, denote the members of a finite collection of band-limited windows, well localized in time, whose Fourier transforms $\hat g_k = \mathcal{F}g_k$ are centered around possibly irregularly (or, e.g., geometrically) spaced frequency points $\omega_k$.

Then, we select frequency-dependent time-shift parameters (hop sizes) as follows: if the support (the interval where the vector is nonzero) of $\hat g_k$ is contained in an interval of length $L_k$, then $a_k$ is chosen such that

$$a_k \le \frac{L}{L_k} \quad \text{for all } k. \tag{5}$$

In other words, the time-sampling points have to be chosen dense enough to guarantee (5). If we denote by $\hat g_{n,k}$ the modulation of $\hat g_k$ by $-na_k$, i.e. $\hat g_{n,k} = M_{-na_k}\hat g_k$, then we obtain the frame members $\varphi_{n,k}$ by setting

$$\varphi_{n,k} = g_{n,k} = \mathcal{F}^{-1}(M_{-na_k}\hat g_k) = T_{na_k} g_k,$$

where $k \in I_K$ and $n = 0, \dots, L/a_k - 1$. The system $\mathcal{G}(\mathbf{g}, \mathbf{a}) := \{g_{n,k} = T_{na_k}g_k\}_{n,k}$ is a painless nonstationary Gabor system, as described in [1], for $\mathbb{C}^L$. We also define $\mathbf{g} := \{g_k \in \mathbb{C}^L\}_{k \in I_K}$ and $\mathbf{a} := \{a_k\}_{k \in I_K}$. By Parseval's formula, we see that the frame coefficients can be written as

$$c_{n,k} = \langle f, g_{n,k}\rangle = \langle \hat f, M_{-na_k}\hat g_k\rangle. \tag{6}$$
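Identity (6) relies on a unitary DFT, under which $\mathcal{F}(T_x g) = M_{-x}\hat g$; NumPy exposes this normalization as `norm="ortho"`. The sketch below checks (6) for one channel and one time index with random data; the sizes and names are arbitrary choices for illustration.

```python
import numpy as np

def modulate(x, omega):
    """M_omega x[j] = exp(2 pi i omega j / L) x[j]."""
    j = np.arange(len(x))
    return np.exp(2j * np.pi * omega * j / len(x)) * x

rng = np.random.default_rng(1)
L, a_k, n = 64, 8, 3
f = rng.standard_normal(L) + 1j * rng.standard_normal(L)
g_k = rng.standard_normal(L) + 1j * rng.standard_normal(L)

lhs = np.vdot(np.roll(g_k, n * a_k), f)              # <f, T_{n a_k} g_k>
f_hat = np.fft.fft(f, norm="ortho")                  # unitary DFT
g_hat = np.fft.fft(g_k, norm="ortho")
rhs = np.vdot(modulate(g_hat, -n * a_k), f_hat)      # <f_hat, M_{-n a_k} g_hat>
print(np.isclose(lhs, rhs))
```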

For convenience, we use the notation $c := \{c_k\}_{k \in I_K} := \{\{c_{n,k}\}_{n=0}^{L/a_k - 1}\}_{k \in I_K}$ to refer to the full set of coefficients and to the channel coefficients, respectively. By abuse of notation, we indicate by $c \in \mathbb{C}^{L/a_k \times |I_K|}$ that $c$ is an irregular array with $|I_K|$ columns, the $k$-th column possessing $L/a_k$ entries. The NSG coefficients can be computed using the following algorithm.

Algorithm 1 NSG analysis: $c = \text{CQ-NSGT}_L(f, \mathbf{g}, \mathbf{a})$
1: Initialize $f$, $\hat g_k$ for all $k \in I_K$
2: $\hat f \leftarrow \mathrm{FFT}_L(f)$
3: for $k \in I_K$, $n = 0, \dots, L/a_k - 1$ do
4: $c_{n,k} \leftarrow \mathrm{IFFT}_{L/a_k}(\hat f \cdot \overline{\hat g_k})[n]$
5: end for

Here (I)FFT$_N$ denotes an (inverse) fast Fourier transform of length $N$, including the necessary periodization or zero-padding preprocessing to convert the input vector to the correct length $N$. The analysis algorithm above is complemented by Algorithm 2, an equally simple synthesis algorithm that synthesizes a signal $\tilde f$ from a set of coefficients $c$.



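Algorithm 2 NSG synthesis: $\tilde f = \text{iCQ-NSGT}_L(c, \tilde{\mathbf{g}}, \mathbf{a})$
1: Initialize $c_{n,k}$, $\hat{\tilde g}_k$ for all $n = 0, \dots, L/a_k - 1$, $k \in I_K$
2: for $k \in I_K$ do
3: $\hat f_k \leftarrow \hat{\tilde g}_k \cdot \mathrm{FFT}_{L/a_k}(c_k)$
4: end for
5: $\hat{\tilde f} \leftarrow \sum_{k \in I_K} \hat f_k$
6: $\tilde f \leftarrow \mathrm{IFFT}_L(\hat{\tilde f})$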



If $\mathcal{G}(\mathbf{g}, \mathbf{a})$ and $\mathcal{G}(\tilde{\mathbf{g}}, \mathbf{a})$ are a pair of dual frames, then we can reconstruct a function perfectly from its NSG analysis coefficients. For more details and a proof of the following propositions, see Appendix 8.1.

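Proposition 1. Let $\mathcal{G}(\mathbf{g}, \mathbf{a}) = \{g_{n,k} = T_{na_k}g_k\}_{n,k}$ and $\mathcal{G}(\tilde{\mathbf{g}}, \mathbf{a}) = \{\tilde g_{n,k} = T_{na_k}\tilde g_k\}_{n,k}$ be a pair of dual frames. If $c$ is the output of CQ-NSGT$_L(f, \mathbf{g}, \mathbf{a})$ (Algorithm 1), then the output $\tilde f$ of iCQ-NSGT$_L(c, \tilde{\mathbf{g}}, \mathbf{a})$ (Algorithm 2) equals $f$, i.e.

$$\tilde f = f, \quad \text{for all } f \in \mathbb{C}^L. \tag{7}$$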

The remaining problem is to ascertain that $\mathcal{G}(\mathbf{g}, \mathbf{a})$ is a frame and to compute the dual frame. The following proposition is a discrete version of an equivalent result for NSG systems in $L^2(\mathbb{R})$ and achieves both, using the painless case condition (5).

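Proposition 2. Let $\mathcal{G}(\mathbf{g}, \mathbf{a})$ be an NSG system satisfying (5). This system is a frame if and only if

$$0 < \sum_{k \in I_K} \frac{L}{a_k} |\hat g_k[j]|^2 < \infty, \quad \text{for all } j = 0, \dots, L-1, \tag{8}$$

and the generators of the canonical dual frame $\mathcal{G}(\tilde{\mathbf{g}}, \mathbf{a})$ are given by

$$\hat{\tilde g}_k[j] = \frac{\hat g_k[j]}{\sum_{l \in I_K} \frac{L}{a_l} |\hat g_l[j]|^2}. \tag{9}$$

In the next section, we construct a constant-Q NSG system satisfying (5) and (8).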
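Algorithms 1 and 2 together with the painless dual (9) fit into a few lines of NumPy. The sketch below is a minimal illustration under our own assumptions, not the authors' toolbox: the frequency-domain windows are hypothetical Hann bumps on circular intervals that jointly cover all DFT bins, so that (8) holds, and each $L/a_k$ is the smallest divisor of $L$ not smaller than the support length, so that (5) holds with integer $a_k$. The length-$L$ DFT is unitary, as in the normalization above, while the short FFTs are NumPy's unnormalized ones, which reproduces (6) without extra factors.

```python
import numpy as np

def make_window(L, start, length):
    """Hypothetical frequency-domain window: a Hann bump on the circular interval
    [start, start + length), strictly positive inside and zero outside."""
    g_hat = np.zeros(L)
    idx = (start + np.arange(length)) % L
    g_hat[idx] = np.hanning(length + 2)[1:-1]
    return g_hat, idx

def nsg_analysis(f, g_hats, supports, M):
    """Algorithm 1: c_k[n] = sum_j f_hat[j] conj(g_hat_k[j]) exp(2 pi i n j / M_k), M_k = L/a_k."""
    f_hat = np.fft.fft(f, norm="ortho")
    coeffs = []
    for g_hat, idx, M_k in zip(g_hats, supports, M):
        folded = np.zeros(M_k, dtype=complex)
        # periodize f_hat * conj(g_hat_k) to length M_k; a support of length <= M_k cannot collide
        folded[idx % M_k] = f_hat[idx] * np.conj(g_hat[idx])
        coeffs.append(M_k * np.fft.ifft(folded))    # np.fft.ifft divides by M_k, undo that
    return coeffs

def painless_duals(g_hats, M):
    """Canonical dual windows (9); the diagonal of S must be positive, cf. (8)."""
    diag = sum(M_k * np.abs(g_hat) ** 2 for g_hat, M_k in zip(g_hats, M))
    assert np.all(diag > 0), "condition (8) fails: some DFT bin is not covered"
    return [g_hat / diag for g_hat in g_hats]

def nsg_synthesis(coeffs, dual_hats, supports, M, L):
    """Algorithm 2: f_hat = sum_k dual_hat_k * FFT_{M_k}(c_k), then an inverse unitary DFT."""
    f_hat = np.zeros(L, dtype=complex)
    for c_k, d_hat, idx, M_k in zip(coeffs, dual_hats, supports, M):
        spectrum = np.fft.fft(c_k)                  # unnormalized length-M_k FFT
        f_hat[idx] += d_hat[idx] * spectrum[idx % M_k]
    return np.fft.ifft(f_hat, norm="ortho")

L = 96
intervals = [(90, 12), (3, 8), (8, 12), (16, 20), (32, 32), (60, 34)]   # cover all 96 bins
g_hats, supports = zip(*(make_window(L, s, n) for s, n in intervals))
divisors = [d for d in range(1, L + 1) if L % d == 0]
M = [min(d for d in divisors if d >= len(idx)) for idx in supports]    # M_k = L/a_k >= L_k

rng = np.random.default_rng(0)
f = rng.standard_normal(L)
c = nsg_analysis(f, g_hats, supports, M)
f_rec = nsg_synthesis(c, painless_duals(g_hats, M), supports, M, L)
print(np.allclose(f_rec, f))
```

Because $S$ is diagonal in frequency in the painless case, the dual costs one division per bin, which is what makes the frequency-side NSGT cheap to invert.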

Remark 1. Note that NSG frames can equivalently be used to design general nonuniform filter banks [14], [16] in a similar manner.
Table 1. Center frequency and bandwidth values

3. The CQ-NSGT Parameters: Windows and Lattices

The parameters of the NSGT can be designed to implement various frequency-adaptive transforms. Here, we focus on the parameters leading to an NSGT with constant-Q frequency resolution, suitable for the analysis and processing of music signals, as discussed in the introduction. In constant-Q analysis, the functions $\hat g_k$ are considered to be filters with support of length $L_k$ centered at frequency $\omega_k$ (in samples), such that, for the bins corresponding to a certain frequency range, the respective center frequencies and support lengths have (approximately) the same ratio. Using these filters, the CQ-NSGT coefficients $c_{n,k}$ are obtained via Algorithm 1, where $k$ indexes the frequency bins and $n = 0, \dots, L/a_k - 1$.

As detailed in [21], the construction of the filters for the CQ-NSGT depends on the following parameters: the minimum and maximum frequencies $\xi_{\min}$ and $\xi_{\max}$ (in Hz), the sampling rate $\xi_s$, and the number of bins per octave $B$. The center frequencies $\xi_k$ satisfy $\xi_k = \xi_{\min}\, 2^{(k-1)/B}$, similar to the classical CQT in [5], for $k = 1, \dots, K$, where $K$ is an integer such that $\xi_{\max} \le \xi_K < \xi_s/2$, the Nyquist frequency. Note that the correspondence between $\xi_k$ and $\omega_k$ is given by the conversion ratio $L/\xi_s$ from Hz to samples, as detailed in the next paragraphs.

The bandwidths are set to $\Omega_k = \xi_{k+1} - \xi_{k-1}$ for $k = 2, \dots, K-1$, which leads to a constant Q-factor $Q = \xi_k/\Omega_k = (2^{1/B} - 2^{-1/B})^{-1}$, while $\Omega_1$ and $\Omega_K$ are taken to be $\xi_1/Q$ and $\xi_K/Q$, respectively. Since the signals are real-valued, additional filters are included, positioned symmetrically with respect to the Nyquist frequency. Moreover, to ensure that the union of the filter supports covers the entire frequency axis, filters with center frequencies at the zero frequency and at the Nyquist frequency are included. The values of $\xi_k$ and $\Omega_k$ over all frequency bins are summarized in Table 1.
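The parameter rules above translate directly into code. A small sketch, assuming the rule for $K$ as stated (the smallest $K$ with $\xi_K \ge \xi_{\max}$, which must still lie below the Nyquist frequency); the function name and example values are our own and are not taken from the toolbox.

```python
import numpy as np

def cq_center_frequencies(xi_min, xi_max, xi_s, B):
    """xi_k = xi_min * 2**((k-1)/B), k = 1..K, bandwidths Omega_k (Hz) and the Q-factor."""
    K = int(np.ceil(B * np.log2(xi_max / xi_min))) + 1   # smallest K with xi_K >= xi_max
    xi = xi_min * 2.0 ** (np.arange(K) / B)
    if xi[-1] >= xi_s / 2:
        raise ValueError("xi_K must stay below the Nyquist frequency xi_s / 2")
    Q = 1.0 / (2.0 ** (1.0 / B) - 2.0 ** (-1.0 / B))
    omega = np.empty(K)
    omega[1:-1] = xi[2:] - xi[:-2]          # Omega_k = xi_{k+1} - xi_{k-1}, k = 2..K-1
    omega[0], omega[-1] = xi[0] / Q, xi[-1] / Q
    return xi, omega, Q

xi, omega, Q = cq_center_frequencies(xi_min=50.0, xi_max=20000.0, xi_s=44100.0, B=48)
print(len(xi), round(Q, 2), np.allclose(xi / omega, Q))
```

Since $\xi_{k+1} - \xi_{k-1} = \xi_k(2^{1/B} - 2^{-1/B})$, the ratio $\xi_k/\Omega_k$ is the same for every bin, which is what the final check confirms.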

With these center frequencies and bandwidths, the filters $\hat g_k$ are set to $\hat g_k[j] = H\big((j\xi_s/L - \xi_k)/\Omega_k\big)$, for $k = 1, \dots, K, K+2, \dots, 2K+1$, where $H$ is some continuous function centered at 0, positive inside and zero outside of $]-1/2, 1/2[$, i.e. each $\hat g_k$ is a sampled version of a translated and dilated $H$. Meanwhile, $\hat g_0$ and $\hat g_{K+1}$ are taken to be plateau functions centered at the zero and the Nyquist frequencies, respectively. Thus, each filter $\hat g_k$ is centered at $\omega_k = \xi_k L/\xi_s$ and has support

$$L_k = \Omega_k L/\xi_s.$$

It is easy to see that this choice of $\mathcal{G}(\mathbf{g}, \mathbf{a})$ satisfies the conditions of Proposition 2 for any sequence $\mathbf{a}$ with $L/a_k \ge L_k$ for all $k \in I_K = \{0, \dots, 2K+1\}$. Note that while $a_k$ might be rational, $L/a_k$ must be integer-valued. Consequently, perfect reconstruction of the signal is obtained from the coefficients $c_{n,k}$ by applying Algorithm 2 with a dual frame, e.g. the canonical dual given by (9).
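The sampled filters and the hop-size condition can be checked in the same spirit. In this sketch, $H$ is taken to be a raised cosine on $]-1/2, 1/2[$ (one admissible choice among many), only the bins $k = 1, \dots, K$ below the Nyquist frequency are built, the plateau filters $\hat g_0$, $\hat g_{K+1}$ and the mirrored filters are omitted, and $L/a_k$ is set to the number of nonzero bins of $\hat g_k$, which satisfies (5). All names and parameter values are illustrative assumptions.

```python
import numpy as np

def hann_H(t):
    """One admissible H: continuous, positive on ]-1/2, 1/2[ and zero outside."""
    return np.where(np.abs(t) < 0.5, np.cos(np.pi * t) ** 2, 0.0)

def sampled_cq_filters(L, xi_s, xi, omega):
    """g_hat_k[j] = H((j xi_s / L - xi_k) / Omega_k) on the L-point DFT grid."""
    bin_freqs = np.arange(L) * xi_s / L                  # frequency (Hz) of DFT bin j
    return [hann_H((bin_freqs - x) / w) for x, w in zip(xi, omega)]

B, xi_min, K = 12, 100.0, 25                            # two octaves above 100 Hz
xi = xi_min * 2.0 ** (np.arange(K) / B)
Q = 1.0 / (2.0 ** (1.0 / B) - 2.0 ** (-1.0 / B))
omega = xi / Q                                          # equals xi_{k+1} - xi_{k-1} inside
L, xi_s = 2 ** 16, 44100.0

g_hats = sampled_cq_filters(L, xi_s, xi, omega)
L_k = omega * L / xi_s                                  # support length in bins
M = [int(np.count_nonzero(g)) for g in g_hats]          # chosen L/a_k (integer-valued)
for g, m, lk in zip(g_hats, M, L_k):
    nz = np.flatnonzero(g)
    assert np.all(np.diff(nz) == 1)                     # one contiguous interval of bins
    assert m <= np.ceil(lk)                             # at most ceil(L_k) nonzero bins
print(M[0], M[-1])
```

The support grows proportionally with the center frequency, so low bins get few time samples $L/a_k$ and high bins many, which is the constant-Q time-frequency trade-off in sampled form.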
Figure 2. Tukey windows used in the slicing process. Note that the chosen amount of zero-padding leads to a half-overlap situation.



4. Real-time processing and the sliCQ 

The CQ-NSGT implementation introduced in the previous sections relies a priori on a Fourier transform of the entire signal. This contradicts the idea of real-time applications, which require bounded delay in processing incoming samples and linear overall complexity. These requirements can be satisfied by applying the CQ-NSGT blockwise, i.e. to (fixed-length) slices of the input signal. However, the slicing process involves two important challenges. First, the windows $h_m$ used for cutting the signal must be smooth, and zero-padding has to be applied to suppress time-aliasing and blocking artifacts when the coefficients are modified. Second, the coefficients obtained from the blockwise transform should be equivalent to the CQ coefficients obtained from a full-length CQ-NSGT. This can be achieved to high precision by a careful choice of both the slicing windows $h_m$ and the analysis windows used in the CQ-NSGT.



4.1. Structure of the sliCQ transform. We now summarize the individual steps 
of the sliCQ algorithm and introduce the involved parameters. 

I) Sliced constant-Q NSGT analysis:

(1) Cut the signal $f \in \mathbb{C}^L$ into overlapping slices $f_m$ of length $2N$ by multiplication with uniform translates of a slicing window $h_0$ centered at 0.

(2) For each $f_m$, obtain coefficients $c^m \in \mathbb{C}^{2N/a_k \times |I_K|}$ by applying CQ-NSGT$_{2N}(f_m, \mathbf{g}, \mathbf{a})$ (Algorithm 1).

(3) Due to the overlap of the slicing windows, cf. Figure 2, each time index is related to two consecutive slices. For visualization and processing, the slice coefficients $c^m$ are rearranged into a 2-layer array $s$, with $s := \{s^l\}_{l \in \{0,1\}} \in \mathbb{C}^{2 \times L/a_k \times |I_K|}$, cf. Figure 3.

II) Sliced constant-Q NSGT synthesis:

(1) Retrieve the slice coefficients $c^m$ by partitioning $s$.

(2) Compute the dual frame $\mathcal{G}(\tilde{\mathbf{g}}, \mathbf{a})$ for $\mathcal{G}(\mathbf{g}, \mathbf{a})$ and, for all $m$, $\tilde f_m = $ iCQ-NSGT$_{2N}(c^m, \tilde{\mathbf{g}}, \mathbf{a})$ (Algorithm 2).

(3) Recover $f$ by (windowed) overlap-add.

Note that $L$ must be a multiple of $2N$; this is achieved by zero-padding, if necessary. By construction, the positions $(n,k)$ of the coefficients in $s^l$ reflect their time-frequency position with respect to the full-length signal, for $l = 0, 1$.
Figure 3. Structure of the sliCQ coefficients: schematic illustration

4.2. Computation of a sliced constant-Q NSGT. The sliced constant-Q NSGT (sliCQ) coefficients of $f$ with respect to $h_0$ and $\mathcal{G}(\mathbf{g}, \mathbf{a})$ and slice length $2N$ are obtained according to the following algorithm.



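Algorithm 3 sliCQ analysis: $s = \text{sliCQ}_{L,N}(f, h_0, \mathbf{g}, \mathbf{a})$
1: Initialize $f$, $h_0$, $\hat g_k$ for all $k \in I_K$
2: for $m = 0, \dots, L/N - 1$ do
3: for $j = 0, \dots, 2N - 1$ do
4: $f_m[j] \leftarrow f[j + (m-1)N]\, h_0[j - N]$
5: end for
6: $c^m \leftarrow \text{CQ-NSGT}_{2N}(f_m, \mathbf{g}, \mathbf{a})$
7: $l \leftarrow (m \bmod 2)$
8: for $k \in I_K$, $n' = 0, \dots, 2N/a_k - 1$ do
9: $s^l_{n' + (m-1)N/a_k,\,k} \leftarrow c^m_{n',k}$
10: end for
11: end for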



Note that in this and the following algorithm, negative indices are used in a circular sense with respect to the maximum admissible index, e.g. $f[-j] := f[L-j]$ or $s^l_{-n,k} := s^l_{L/a_k - n,k}$. As with the CQ-NSGT analysis before, Algorithm 3 is complemented by a synthesis algorithm with similar structure, Algorithm 4, that synthesizes a signal $\tilde f$ from a 2-layer coefficient array $s$.
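The circular index bookkeeping of the rearrangement step (Algorithm 3) and of the partitioning step (Algorithm 4) can be sketched for a single channel $k$. The sketch assumes integer $a_k$ dividing $N$ and $L$ a multiple of $2N$; the helper names are hypothetical.

```python
import numpy as np

def slice_positions(m, N, a_k, L):
    """Indices n' + (m-1) N/a_k, n' = 0..2N/a_k - 1, taken circularly modulo L/a_k."""
    return (np.arange(2 * N // a_k) + (m - 1) * (N // a_k)) % (L // a_k)

def to_layers(slice_coeffs, N, a_k, L):
    """Rearrangement in Algorithm 3 for one channel: c^m_k goes to layer l = m mod 2."""
    s = np.zeros((2, L // a_k), dtype=complex)
    for m, c_m in enumerate(slice_coeffs):
        s[m % 2, slice_positions(m, N, a_k, L)] = c_m
    return s

def from_layers(s, N, a_k, L):
    """Partitioning in Algorithm 4 for one channel: recover every c^m_k from s."""
    return [s[m % 2, slice_positions(m, N, a_k, L)] for m in range(L // N)]

L, N, a_k = 64, 8, 2                     # L is a multiple of 2N, a_k divides N
rng = np.random.default_rng(0)
slices = [rng.standard_normal(2 * N // a_k) for _ in range(L // N)]
s = to_layers(slices, N, a_k, L)
print(all(np.array_equal(x, y) for x, y in zip(from_layers(s, N, a_k, L), slices)))
```

Slices of equal parity are $2N$ samples apart and have length $2N$, so each layer is tiled exactly once, and partitioning is a lossless inverse of the rearrangement.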

The following proposition states that $f$ is perfectly recovered from its sliCQ coefficients by applying Algorithm 4; see Appendix 8.2 for a proof.

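Proposition 3. Let $\mathcal{G}(\mathbf{g}, \mathbf{a})$ and $\mathcal{G}(\tilde{\mathbf{g}}, \mathbf{a})$ be dual NSG systems for $\mathbb{C}^{2N}$. Further, let $h_0, \tilde h_0 \in \mathbb{C}^L$ satisfy

$$\sum_{m=0}^{L/N-1} T_{mN}(h_0 \tilde h_0) = 1. \tag{10}$$

If $s$ is the output of sliCQ$_{L,N}(f, h_0, \mathbf{g}, \mathbf{a})$ (Algorithm 3), then the output $\tilde f$ of isliCQ$_{L,N}(s, \tilde h_0, \tilde{\mathbf{g}}, \mathbf{a})$ (Algorithm 4) equals $f$, i.e. $\tilde f = f$.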
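Proposition 3 can be illustrated at the level of the slicing alone. The following sketch assumes a triangular slicing window $h_0$ (whose translates by $N$ sum to one), takes $\tilde h_0$ as the indicator of the $2N$ samples around 0 so that (10) holds, and replaces CQ-NSGT$_{2N}$ and its inverse by NumPy's FFT pair as a stand-in for any perfectly invertible per-slice transform. Indices follow Algorithms 3 and 4 and are circular modulo $L$.

```python
import numpy as np

def slice_transform_overlap_add(f, N, h0, h0_dual, forward, inverse):
    """Slices f_m[j] = f[j + (m-1)N] h0[j - N], transform, invert, overlap-add with h0_dual."""
    L = len(f)
    j = np.arange(2 * N)
    f_rec = np.zeros(L, dtype=complex)
    for m in range(L // N):
        pos = (j + (m - 1) * N) % L                 # samples covered by slice m
        f_m = f[pos] * h0[(j - N) % L]              # h_m = T_{mN} h0 restricted to the slice
        f_m_rec = inverse(forward(f_m))             # stand-in for (i)CQ-NSGT_{2N}
        f_rec[pos] += f_m_rec * h0_dual[(j - N) % L]
    return f_rec

L, N = 64, 8
t = np.arange(L)
dist = np.minimum(t, L - t)                         # circular distance to index 0
h0 = np.clip(1.0 - dist / N, 0.0, None)             # triangle: sum_m T_{mN} h0 = 1
h0_dual = (dist <= N).astype(float)                 # 1 on the 2N samples around 0

# condition (10): sum_m T_{mN}(h0 * h0_dual) = 1
pou = sum(np.roll(h0 * h0_dual, m * N) for m in range(L // N))
rng = np.random.default_rng(0)
f = rng.standard_normal(L)
f_rec = slice_transform_overlap_add(f, N, h0, h0_dual, np.fft.fft, np.fft.ifft)
print(np.allclose(pou, 1.0), np.allclose(f_rec.real, f))
```

Any per-slice transform pair with perfect reconstruction can be substituted for the FFT pair; the overlap-add step only relies on (10).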



Algorithm 4 sliCQ synthesis: $\tilde f = \text{isliCQ}_{L,N}(s, \tilde h_0, \tilde{\mathbf{g}}, \mathbf{a})$
1: Initialize $s$, $\tilde h_0$, $\hat{\tilde g}_k$ for all $k \in I_K$
2: $\tilde f \leftarrow 0_L$
3: for $m = 0, \dots, L/N - 1$ do
4: $l \leftarrow (m \bmod 2)$
5: for $k \in I_K$, $n' = 0, \dots, 2N/a_k - 1$ do
6: $c^m_{n',k} \leftarrow s^l_{n' + (m-1)N/a_k,\,k}$
7: end for
8: $\tilde f_m \leftarrow \text{iCQ-NSGT}_{2N}(c^m, \tilde{\mathbf{g}}, \mathbf{a})$
9: for $j = 0, \dots, 2N - 1$ do
10: $\tilde f[j + (m-1)N] \leftarrow \tilde f[j + (m-1)N] + \tilde f_m[j]\, \tilde h_0[j - N]$
11: end for
12: end for



4.3. The relation between CQ-NSGT and sliCQ. To maintain perfect reconstruction in the final overlap-add step in Algorithm 4, we assume

$$h_m = T_{mN}h_0 \quad \text{with} \quad \sum_{m=0}^{L/N-1} h_m = 1, \tag{11}$$

and use a dual window $\tilde h_0$ satisfying (10) in the synthesis process.

Another obvious option for the design of the slicing windows is to require $\sum_m h_m^2 = 1$, which would allow for using the same windows in the final overlap-add step. However, if we want to approximate the true CQ coefficients as obtained from a full-length transform, (11) is the more favorable condition.

In our implementation, slicing of the signal is accomplished by a uniform partition of unity constructed from a Tukey window $h_0$ with essential length $N$ and transition areas of length $M$, for some $N, M \in \mathbb{N}$ with $M < N$ (usually $M \ll N$). The slicing windows are symmetrically zero-padded to length $2N$, which reduces time-aliasing significantly. The uniform partition condition leads to a close approximation of the full-length CQ-NSGT by the sliCQ. This correspondence between the sliCQ and the corresponding full-length CQ-NSGT is made explicit in the following proposition, proven in Appendix 8.2.
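A Tukey-type partition of unity of the kind described above can be generated as follows. This is a sketch under our own conventions, not the toolbox window: raised-cosine flanks of length $M$, a flat top of length $N - M$ (so the essential length is $N$), the window centered at index 0 of $\mathbb{C}^L$ and zero outside the $2N$ samples around 0, with $N$ and $M$ even and $M \le N$.

```python
import numpy as np

def tukey_slicing_window(L, N, M):
    """h_0 in C^L centered at index 0: raised-cosine flanks of length M and a flat top
    of length N - M; its translates by multiples of N sum to one, cf. (11)."""
    assert N % 2 == 0 and M % 2 == 0 and 0 < M <= N and L % N == 0
    rise = np.sin(0.5 * np.pi * (np.arange(M) + 0.5) / M) ** 2
    values = np.ones(N + M)
    values[:M] = rise               # rising flank
    values[-M:] = 1.0 - rise        # falling flank; 1 - rise equals rise reversed
    t = np.arange(-(N + M) // 2, (N + M) // 2)    # nonzero range, inside [-N, N)
    h0 = np.zeros(L)
    h0[t % L] = values
    return h0

L, N, M = 1024, 128, 16
h0 = tukey_slicing_window(L, N, M)
partition = sum(np.roll(h0, m * N) for m in range(L // N))
print(np.allclose(partition, 1.0), np.isclose(h0.sum(), N))
```

Wherever two neighboring flanks overlap, the falling value $1 - r(u)$ of one translate meets the rising value $r(u)$ of the next, so the sum is exactly one; the window sum equals $N$, matching the stated essential length.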

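Proposition 4. Let $\mathcal{G}(\mathbf{g}^L, \mathbf{a})$ be a nonstationary Gabor system for $\mathbb{C}^L$. Further, let $h_0 \in \mathbb{C}^L$ be such that (11) holds, and define $\hat g_k \in \mathbb{C}^{2N}$, for all $k \in I_K$, by

$$\hat g_k[j] = \hat g^L_k[jL/(2N)].$$

For $f \in \mathbb{C}^L$, denote by $c \in \mathbb{C}^{L/a_k \times |I_K|}$ the CQ-NSGT coefficients of $f$ with respect to $\mathcal{G}(\mathbf{g}^L, \mathbf{a})$ and by $s \in \mathbb{C}^{2 \times L/a_k \times |I_K|}$ the sliCQ coefficients of $f$ with respect to $h_0$ and $\mathcal{G}(\mathbf{g}, \mathbf{a})$. Then

$$|c_{n,k} - s^0_{n,k} - s^1_{n,k}| \le \|f\|_2 \Big( \big\|(1 - h_0 - h_1)\, T_{n'a_k} g^L_k\big\|_2 + \Big\|(h_0 + h_1) \sum_{l=1}^{L/(2N)-1} T_{n'a_k + 2lN}\, g^L_k\Big\|_2 \Big), \tag{12}$$

for $n = mN/a_k + n'$, with $m = 0, \dots, L/N - 1$ and $n' = 0, \dots, N/a_k - 1$.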

Remark 2. In practice, $g_k$ is chosen such that the translates $T_{na_k}g_k$ are essentially concentrated in

$$I_{n,m} = \Big[\frac{N-M}{2},\, N + \frac{N-M}{2}\Big],$$

i.e. $\|T_{na_k}g_k\, \chi_{\mathbb{R} \setminus I_{n,m}}\|_2 \ll \|T_{na_k}g_k\|_2$ for all $n = 0, \dots, N/a_k - 1$. Therefore, the value of (12) is negligibly small. While more precise estimates of the error are beyond the scope of the present contribution, a numerical evaluation of the approximation quality is given in Section 5.3.



As a consequence of the previous proposition, we define the sliCQ spectrogram as $|s^0 + s^1|^2$ and propose to treat $s^0_{n,k}$ and $s^1_{n,k}$, which correspond to the same time-frequency position, simultaneously when processing the coefficients.



5. Numerical Analysis and Simulations 

In this section, we treat the computational complexity of the CQ-NSGT and the sliCQ and compare them to one another. In [21], it was shown that, despite its super-linear complexity, the CQ-NSGT outperforms state-of-the-art implementations of the classical constant-Q transform. Since the sliCQ is a linear-cost algorithm, it further improves on the efficiency of the CQ-NSGT for sufficiently long signals. Section 5.3 provides experimental results confirming the good approximation of the CQ-NSGT by the corresponding sliCQ coefficients, cf. Proposition 4.

The CQ-NSGT and sliCQ Toolbox (for MATLAB and Python) used in this contribution is available at http://www.univie.ac.at/nonstatgab/slicq, alongside extended experimental results complementing those presented in Section 6.

5.1. Computation Time and Computational Complexity. We assume the number of filters $|I_K|$ in the CQ-NSGT to be independent of the signal length $L$, and Proposition 2 to hold, in particular $L/a_k \ge L_k$. The support size $L_k$ of each filter $\hat g_k$ depends on $L$. Hence, the number of operations for Algorithm 1 is

$$\mathcal{O}\Big(\underbrace{L\log L}_{\mathrm{FFT}_L} + \sum_{k \in I_K}\big(\underbrace{L/a_k \log(L/a_k)}_{\mathrm{IFFT}_{L/a_k}} + \underbrace{L_k}_{\hat f \cdot \overline{\hat g_k}}\big)\Big).$$

With $L_k$ and $L/a_k$ bounded by $L$, this can be simplified to $\mathcal{O}(L\log L)$.

The computation of the dual frame involves the inversion of the multiplication operator $S$ and the application of the resulting operator to each filter. This results in $\mathcal{O}\big(\sum_{k \in I_K} L_k\big) = \mathcal{O}(L)$ operations, where the support of the $\hat g_k$ was taken into account.
Figure 4. Computation time versus signal length of the CQ transform (dark gray) and the CQ-NSGT. For the CQ-NSGT, we show separate graphs including (light gray) and neglecting (black) prime signal lengths. Graphs show the mean performance (solid) and variance (dashed) over 50 iterations.

The complexity of Algorithm 2 can be derived to be $\mathcal{O}(L\log L)$, analogous to Algorithm 1.

For sliCQ$_{L,N}$ (Algorithm 3), we assume the slice length $2N$ to be independent of $L$, resulting in a computational complexity of

$$\mathcal{O}\Big(\frac{L}{N} \cdot 2N\log(2N)\Big) = \mathcal{O}(L).$$

Both the dual frame and $\tilde h_0$ can be precomputed independently of $L$, whilst Algorithm 4 is of complexity $\mathcal{O}(L)$, analogous to Algorithm 3.
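To make the difference in growth concrete, the leading-order operation counts can be compared with all constants dropped. The functions below are a rough model under that assumption, not measured costs: one transform of length $L$ for the CQ-NSGT versus $L/N$ transforms of length $2N$ for the sliCQ.

```python
import math

def cqnsgt_ops(L):
    """Leading term of Algorithm 1: L log2 L."""
    return L * math.log2(L)

def slicq_ops(L, N):
    """L/N slices, each a transform of length 2N costing about 2N log2(2N)."""
    return (L // N) * 2 * N * math.log2(2 * N)

N = 2 ** 13                                   # slice length 2N = 16384
for L in (2 ** 20, 2 ** 21, 2 ** 22):
    print(L, slicq_ops(2 * L, N) / slicq_ops(L, N), cqnsgt_ops(2 * L) / cqnsgt_ops(L))
```

Doubling $L$ exactly doubles the sliCQ count for a fixed slice length, while the full-length count grows by slightly more than a factor of two.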

5.2. Performance evaluation. A comparison of the CQ-NSGT algorithm with previous constant-Q implementations was given in [21]. Figure 4 reproduces and extends some of the results: for both the constant-Q implementation provided in [18] and the CQ-NSGT, it shows the mean computation time and its variance for analysis followed by reconstruction, plotted against signal length. The plot also illustrates the dependence of the CQ-NSGT on the prime factor decomposition of the signal length $L$.

Figure 5 illustrates the performance of the sliCQ compared to the constant-Q and CQ-NSGT algorithms shown in Figure 4. The linearity of the sliCQ algorithm is evident, with deviations occurring due to unfavorable FFT lengths $2N/a_k$ in (i)CQ-NSGT$_{2N}$. The performance improvements for increasing slice length can be attributed to the advanced nature of MATLAB's internal FFT algorithm, as compared to the current implementation of the sliCQ framework.

The performance of the involved algorithms does not depend on signal content. Consequently, random signals were used in the performance experiments, although we implicitly assumed the signals to be sampled at 44.1 kHz. All results shown represent transforms with 48 bins per octave, a minimum frequency of 50 Hz and a maximum frequency of 22 kHz; in Section 5.3, a maximum frequency of 20 kHz is used instead. For a more comprehensive comparison of the CQ-NSGT to previous constant-Q transforms, please refer to [21]. Results for other parameter values do not differ drastically and are omitted.

Figure 5. Computation time versus signal length of the CQ transform (dotted gray), the CQ-NSGT (dashed gray) and various sliCQ transforms. The sliCQ transforms were taken with slice lengths 4096 (solid gray), 16384 (dotted black), 32768 (dashed black) and 65536 (solid black) samples.

All computation time experiments were run in MATLAB R2011a on a 3 GHz Intel Core 2 Duo machine with 2 GB of RAM running Kubuntu 10.04, using the MATLAB toolboxes available at http://www.elec.qmul.ac.uk/people/anssik/cqt/ and http://www.univie.ac.at/nonstatgab/.

5.3. Approximation properties. To verify the approximate equivalence of the sliCQ coefficients to those of a full-length CQ-NSGT, and thus to a constant-Q transform, we computed the norm difference between $s^0 + s^1$ and $c$, as in Proposition 4, for two sets of fundamentally different signals. Set 1 contains 50 random, complex-valued signals of $2^{20}$ samples length, while Set 2 consists of 90 music samples of the same length, each sampled at 44.1 kHz, covering pop, rock, jazz and classical genres. The signals of the second set are well structured and often well concentrated in the time-frequency plane, characteristics that the first set lacks completely.

For discretization reasons, as well as to achieve good concentration of $g_k$ in Proposition 4, sliCQ implementations must impose a lower bound on the length of $\hat g_k$. Approximation results for various lower bounds on the filter length are summarized in Figure 6, which shows the mean approximation quality over the whole set.

All errors are given as signal-to-noise ratios in dB:

$$\mathrm{SNR} = 20\log_{10}\frac{\|c\|_2}{\|c - (s^0 + s^1)\|_2}.$$
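Taking the SNR as $20\log_{10}(\|c\|_2/\|c - (s^0 + s^1)\|_2)$, which is our reading of this error measure, the computation is a one-liner. The sketch assumes the coefficients of all channels have been flattened into one vector each; the function name and the toy numbers are illustrative.

```python
import numpy as np

def approximation_snr_db(c, s0, s1):
    """20 log10(||c|| / ||c - (s0 + s1)||): quality of s0 + s1 as an approximation of c."""
    c, s0, s1 = (np.asarray(x, dtype=complex) for x in (c, s0, s1))
    return 20.0 * np.log10(np.linalg.norm(c) / np.linalg.norm(c - (s0 + s1)))

c = np.ones(100)
s0 = 0.5 * np.ones(100)
s1 = 0.5 * np.ones(100) - 1e-3                # every coefficient off by 0.001
print(approximation_snr_db(c, s0, s1))        # about 60 dB
```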

Figure 6 shows that, independent of other parameters, a minimal filter length smaller than 8 samples leads to a representation that is visibly different from the full-length CQ-NSGT, while values above 16 samples yield coefficients that are largely equivalent to those of a constant-Q transform. We can see that the slice length itself has a rather small influence on the results, while the interplay of the slicing window shape, specified by the ratio of transition area length to slice length, and the minimal filter length is illustrated nicely; remarkably, this ratio influences the approximation quality mainly for moderately well localized filters. This corresponds to the characterization given in (12): the circular overspill, given by the second term on the right-hand side of (12), depends on the shape and support of the sum of two adjacent slicing windows, in particular for moderately well localized filters. If the windows are very well localized, the overspill is small, independent of the particular shape of the slicing windows. On the other hand, for very badly localized windows, the distinct influence of the slicing windows becomes negligible. Finally, a comparison of the top and bottom graphs in Figure 6 shows that the approximation quality is largely independent of the signal class. For Set 1, the variance is generally negligible (< 0.1 dB) and was omitted. Despite some outliers in Set 2, we have found the approximation quality to depend on the minimal filter length in a stable way, cf. Figure 7. These outliers can be attributed to signals that are particularly sparse (smaller error) or dense (larger error) in low-frequency regions, where $g_k$ is least concentrated.

Figure 6. SliCQ coefficient approximation error against the minimal admissible bandwidth for Set 1 (top) and Set 2 (bottom). All transforms use Blackman-Harris windows in the CQ-NSGT step. Solid and dashed lines represent long (1/4 slice length) and short (1/128 slice length) transition areas, respectively, while colors correspond to the slice length: 4096 (light gray), 16384 (dark gray) and 65536 samples (black).

6. Experiments on Applications 

Experiments in [21] show how the CQ-NSGT can be applied in signal processing, taking advantage of the logarithmic frequency scaling and the perfect reconstruction property. In particular, the transposition of a harmonic structure amounted to just a translation of the spectrum along frequency bins, while masking of the CQ-NSGT coefficients allowed for the extraction or suppression of a signal component. In our experiment, we show that the two procedures can be used together to modify a portion of a signal.

Figure 7. Coefficient approximation error (12) for all signals from Set 2, with slice length 65536 and transition length 16384 samples. Line style indicates the minimal filter length: 8 (dotted), 16 (dashed) and 32 (solid) samples.

Figure 8. Masks for extracting a transient (top) and a sinusoidal component (bottom) of the Glockenspiel signal. The gray level plot describes the amplitude of the mask, with black and white representing 1 and 0, respectively.
Figure 9. CQ-NSGT spectrograms showing an excerpt of the Glockenspiel signal before (top) and after (bottom) transposition of a component.



Figure 8 shows masks for isolating a transient part and the corresponding sinusoidal part of a Glockenspiel signal, created with an ordinary image manipulation program. The program's layers were used to quickly switch the masks on and off, in order to adapt them accurately to the CQ-NSGT representation of the audio. An "inverse mask" is also constructed for the remainder of the signal, essentially decomposing the signal into transient, sinusoidal and background portions. The masks have been drawn in the logarithmic domain to handle the dynamics of the audio. They are scaled linearly in dB units, so that 0 in the mask corresponds to $10^{-5}$ (-100 dB) and 1 corresponds to 1 (0 dB).
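The dB scaling of the masks can be sketched as follows. The sketch assumes that the drawn gray value is linear in dB between -100 dB (value 0) and 0 dB (value 1). It also assumes that the resulting gain multiplies coefficient amplitudes, so 20 dB per decade applies. The helper name `mask_to_gain` is hypothetical.

```python
import numpy as np


def mask_to_gain(mask, floor_db=-100.0):
    """Map mask values in [0, 1], drawn linearly in dB, to amplitude gains.

    0 maps to `floor_db` (here -100 dB, i.e. 1e-5) and 1 maps to 0 dB (gain 1).
    """
    mask = np.clip(np.asarray(mask, dtype=float), 0.0, 1.0)
    gain_db = floor_db * (1.0 - mask)   # linear interpolation in dB
    return 10.0 ** (gain_db / 20.0)     # amplitude (not power) gain


print(mask_to_gain([0.0, 0.5, 1.0]))
```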

While the transient part is kept unchanged, the isolated sinusoidal component of the signal is transposed upward by 2 semitones, corresponding to 8 frequency bins. The transient, the remainder and the modified sinusoidal coefficients are then added, and the inverse transform is applied to obtain the processed signal. For ease of use, this process is carried out on a rectangular representation of the slices, obtained by choosing $L/a_k$ constant for all frequency bands, which corresponds to a sinc interpolation of the coefficients.
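On a rectangular coefficient array, the transposition described above reduces to shifting the masked rows along the frequency axis. The sketch below shows this for one array (for instance, one slice). It assumes that row $k$ holds frequency bin $k$ (from low to high) and that the masks have already been converted to gains. The helper names are hypothetical. Since 2 semitones correspond to 8 bins, the example uses 48 bins per octave. Phases are shifted as they are, without any phase correction.

```python
import numpy as np


def semitones_to_bins(semitones, bins_per_octave):
    """Bin shift for a pitch shift on a log-frequency grid (12 semitones/octave)."""
    return semitones * bins_per_octave / 12.0


def transpose_component(coeffs, component_mask, other_masks, shift):
    """Hypothetical helper: shift one masked component upward by `shift` bins.

    `coeffs` is a rectangular (bins x time) array, as obtained when L/a_k is
    constant; row k is assumed to hold frequency bin k (low to high).
    """
    if shift < 0:
        raise ValueError("this sketch only handles upward shifts")
    n_bins = coeffs.shape[0]
    component = coeffs * component_mask
    shifted = np.zeros_like(component)
    if shift < n_bins:
        # Bin k moves to bin k + shift; the topmost `shift` bins are discarded.
        shifted[shift:] = component[:n_bins - shift]
    # The other parts (e.g. transient and remainder) are added back unchanged.
    kept = sum((coeffs * m for m in other_masks), np.zeros_like(coeffs))
    return kept + shifted


shift = int(semitones_to_bins(2, 48))   # 8 bins at 48 bins per octave
```

The processed coefficients would then be passed to the inverse transform. That step is not sketched here.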

Figure 9 compares the CQ-NSGT spectrograms of the original and the modified signal, while Figure 10 shows the results of the same experiment using sliCQ transforms with different slice lengths. Note that the plots show the spectrogram of the synthesized signal, not the time-frequency coefficients before synthesis. Further, exactly the same mask was used for the CQ-NSGT and sliCQ transpositions. The sound files for this and other transposition experiments are available at

http://www.univie.ac.at/nonstatgab/slicq

A script for the Python toolbox that executes the experiment is available on the same page.

When synthesis is performed from modified coefficients, as opposed to mere reconstruction, evaluating the results is highly non-trivial. This is due to the lack of a properly defined notion of accuracy, or of a target signal, not only for the algorithms presented here but for any signal processing framework based on analysis and synthesis. Thus, while the examples in this section are meant to indicate that CQ-NSGT synthesis and sliCQ synthesis can produce results in accordance with intuition, an in-depth treatment of this subject is far beyond the scope of this article.



7. Summary and Conclusion 

In this contribution, we have introduced a framework for the real-time implementation of an invertible constant-Q transform based on frame theory. The proposed framework allows for a straightforward generalization to other non-linear frequency scales, such as the mel or Bark scale. Since real-time processing is made possible by a preprocessing step, we investigated the possible occurrence of time-aliasing. We provided a numerical evaluation of the computation time and of the quality of approximation of the true NSGT coefficients.

In analogy to the classical phase vocoder, phase issues have to be addressed if CQ-transformed coefficients are processed, cf. [11], [13], [17]. While preliminary experiments applying the proposed framework to real-life signals were presented, undesired phasing effects, mainly due to the contribution of a signal component to several adjacent filters, will be investigated in detail in future work. Furthermore, future work will consider the efficient realization of adaptivity in both time and frequency by varying the length of the preprocessing windows used for slicing.



8. Appendix 
8.1. Derivation of CQ-NSGT properties. 

Proof of Proposition 1. By Algorithm 1 we have

$$c_{n,k} = c_k[n] = \sum_{m=0}^{L/a_k-1} \sum_{l=0}^{a_k-1} \big(f\,\overline{g_k}\big)[m + lL/a_k]\, e^{2\pi i m n a_k/L}. \tag{13}$$

Since $L/a_k \geq L_k$, only one element of the inner sum above is non-zero for each $m \in \{0,\dots,L/a_k-1\}$. It follows that

$$c_{n,k} = \langle f, M_{-na_k} g_k \rangle. \tag{14}$$

Inserting this into Algorithm 2 yields, for all $j \in \{0,\dots,L-1\}$,

$$\tilde{f}[j] = \sum_{k\in I_K} \sum_{n=0}^{L/a_k-1} \langle f, M_{-na_k} g_k\rangle\, M_{-na_k}\tilde{g}_k[j],$$

which is the discrete frame synthesis formula. By assumption, $\mathcal{G}(g,a)$ and $\mathcal{G}(\tilde{g},a)$ are dual NSG frames, and thus

$$\tilde{f}[j] = f[j] \quad \text{for all } j \in \{0,\dots,L-1\}.$$

Applying the inverse discrete Fourier transform completes the proof. □

Figure 10. sliCQ spectrograms showing an excerpt of the Glockenspiel signal after transposition of a component. The top plot uses a slice length of 50000 samples and a transition area of 20000 samples; the bottom plot uses a slice length of 5000 samples and a transition area of 2000 samples.
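The step from (13) to (14) can be checked numerically. The NumPy sketch below compares the folded, unnormalized inverse DFT of (13) with the inner products of (14) for a single band. It assumes NumPy's FFT convention, in which `np.fft.ifft` divides by the transform length. The function names, the Hann-shaped window and the sizes are hypothetical. As in the proof, the window is supported on $L/a_k$ samples.

```python
import numpy as np


def coeffs_folded(f, g, a):
    """Right-hand side of (13): fold f * conj(g) to length L/a, then an
    unnormalized inverse DFT (kernel exp(+2*pi*i*m*n*a/L))."""
    L = len(f)
    M = L // a  # number of coefficients in this band, L / a_k
    # reshape(a, M)[l, m] = x[m + l*M]; summing over l folds the vector.
    folded = (f * np.conj(g)).reshape(a, M).sum(axis=0)
    # np.fft.ifft divides by M, so multiply by M to undo the normalization.
    return M * np.fft.ifft(folded)


def coeffs_direct(f, g, a):
    """Right-hand side of (14): c_n = <f, M_{-na} g> with <u, v> = sum u * conj(v)."""
    L = len(f)
    j = np.arange(L)
    return np.array([
        # np.vdot conjugates its first argument, giving <f, M_{-na} g>.
        np.vdot(g * np.exp(-2j * np.pi * n * a * j / L), f)
        for n in range(L // a)
    ])


rng = np.random.default_rng(0)
L, a = 48, 4
f = rng.standard_normal(L) + 1j * rng.standard_normal(L)
g = np.zeros(L)
g[5:5 + L // a] = np.hanning(L // a)   # support of length L/a, as assumed
print(np.allclose(coeffs_folded(f, g, a), coeffs_direct(f, g, a)))  # True
```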
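Proof of Proposition 2. Denote by $J_k$ an interval of length $L_k$, with $L_k$ as in Section 2, containing the support of $g_k$. By assumption,

$$0 < \sum_{k\in I_K} |g_k[j]|^2 < \infty \quad \text{for all } j = 0,\dots,L-1,$$

and $L/a_k \geq L_k = |J_k|$. Note that the frame operator $S$ can be written as follows (with $\mathrm{IFFT}_{N}$ including the factor $1/N$):

$$\begin{aligned}
Sf[j] &= \sum_{k\in I_K} \sum_{n=0}^{L/a_k-1} \langle f, M_{-na_k} g_k\rangle\, M_{-na_k} g_k[j] \\
&= \sum_{k\in I_K} \frac{L}{a_k} \sum_{n=0}^{L/a_k-1} \mathrm{IFFT}_{L/a_k}\big(f\overline{g_k}\big)[n]\, g_k[j]\, e^{-2\pi i n j a_k/L} \\
&= \sum_{k\in I_K} \frac{L}{a_k}\, \mathrm{FFT}_{L/a_k}\Big(\mathrm{IFFT}_{L/a_k}\big(f\overline{g_k}\big)\Big)[j]\, g_k[j],
\end{aligned} \tag{15}$$

for all $f \in \mathbb{C}^L$. Furthermore, with $\chi_{J_k}$ the characteristic function of the interval $J_k$,

$$f\overline{g_k} = \chi_{J_k} \sum_{l=0}^{a_k-1} T_{lL/a_k}\big(f\overline{g_k}\big) = \chi_{J_k}\, \mathrm{FFT}_{L/a_k}\Big(\mathrm{IFFT}_{L/a_k}\big(f\overline{g_k}\big)\Big)$$

and, obviously, $g_k = \chi_{J_k} g_k$. Inserting this into (15) yields

$$Sf[j] = f[j] \sum_{k\in I_K} \frac{L}{a_k}\, |g_k[j]|^2. \tag{16}$$

Since the sum is bounded above and below, the inverse frame operator can be written as

$$S^{-1}f[j] = f[j] \Big(\sum_{k\in I_K} \frac{L}{a_k}\, |g_k[j]|^2\Big)^{-1} \quad \text{for all } f \in \mathbb{C}^L. \tag{17}$$

Since the elements of the canonical dual frame are given by (3), this completes the proof. □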
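The diagonal form (16) of the frame operator can be checked numerically. The sketch below applies $S$ explicitly as a double sum of rank-one terms and compares the result with $f[j]\sum_k (L/a_k)|g_k[j]|^2$. The windows are random and hypothetical. Each window is supported on an interval of at most $L/a_k$ samples, as the proposition requires, and together the windows cover all of $\mathbb{Z}_L$.

```python
import numpy as np


def frame_operator(f, windows, hops):
    """Apply S f = sum_k sum_n <f, M_{-n a_k} g_k> M_{-n a_k} g_k explicitly."""
    L = len(f)
    j = np.arange(L)
    Sf = np.zeros(L, dtype=complex)
    for g, a in zip(windows, hops):
        for n in range(L // a):
            atom = g * np.exp(-2j * np.pi * n * a * j / L)  # M_{-n a} g
            Sf += np.vdot(atom, f) * atom                    # <f, atom> * atom
    return Sf


def predicted_diagonal(windows, hops, L):
    """Diagonal of S according to (16): sum_k (L / a_k) |g_k[j]|^2."""
    return sum((L // a) * np.abs(g) ** 2 for g, a in zip(windows, hops))


L = 60
rng = np.random.default_rng(1)
supports = [(0, 15), (10, 40), (35, 60)]   # each support fits in L / a_k samples
hops = [4, 2, 2]                            # L / a_k = 15, 30, 30
windows = []
for start, stop in supports:
    g = np.zeros(L)
    g[start:stop] = rng.uniform(0.5, 1.0, stop - start)
    windows.append(g)
f = rng.standard_normal(L) + 1j * rng.standard_normal(L)
d = predicted_diagonal(windows, hops, L)
print(np.allclose(frame_operator(f, windows, hops), d * f))  # True, as in (16)
print(np.allclose(frame_operator(f, windows, hops) / d, f))  # True, as in (17)
```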

8.2. Derivation of sliCQ properties. 

Proof of Proposition 3. According to Proposition 1, $f^m$, the output of the iCQ-NSGT in Step 9 of the sliCQ algorithm, satisfies $f^m[j] = (f\cdot T_{mN}h_0)[j + (m-1)N]$. Since $\sum_m T_{mN}(h_0\tilde{h}_0) = 1$ holds,

$$\tilde{f} = \sum_m \big(f\cdot T_{mN}h_0\big)\, T_{mN}\tilde{h}_0 = f\cdot\sum_m T_{mN}\big(h_0\tilde{h}_0\big) = f$$

follows. □
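The condition $\sum_m T_{mN}(h_0\tilde{h}_0) = 1$ used above can be illustrated with a hypothetical slicing window. The window has length $2N$, a sine-shaped rise and a cosine-shaped fall of length $T$, and a plateau of ones. With the tight choice $\tilde{h}_0 = h_0$, its squared translates by $N$ sum to one on $\mathbb{Z}_L$. This window shape is an assumption for illustration and need not coincide with the slicing windows used in the paper.

```python
import numpy as np


def slicing_window(N, T):
    """Hypothetical slicing window h_0 of length 2N with transitions of length T.

    Zero padding, a sine-shaped rise, a plateau of ones and a cosine-shaped
    fall are arranged so that squared copies shifted by N sum to one.
    """
    if not (0 < T <= N and (N - T) % 2 == 0):
        raise ValueError("need 0 < T <= N and N - T even")
    h = np.zeros(2 * N)
    o = (N - T) // 2                              # zeros before the rising flank
    t = (np.arange(T) + 0.5) / T                  # transition abscissae in (0, 1)
    h[o:o + T] = np.sin(0.5 * np.pi * t)          # rising flank
    h[o + T:o + N] = 1.0                          # plateau
    h[o + N:o + N + T] = np.cos(0.5 * np.pi * t)  # falling flank
    return h


def overlap_sum(h, h_dual, N, L):
    """Evaluate sum_m T_{mN}(h * h_dual) on the cyclic group Z_L."""
    base = np.zeros(L)
    base[:len(h)] = h * h_dual
    return sum(np.roll(base, m * N) for m in range(L // N))


N, T, L = 64, 16, 64 * 8
h0 = slicing_window(N, T)
print(np.allclose(overlap_sum(h0, h0, N, L), 1.0))  # True for the tight choice

# Reconstruction as in the proof: sum_m (f * T_{mN} h_0) * T_{mN} h_0 = f.
f = np.random.default_rng(2).standard_normal(L)
base = np.zeros(L)
base[:2 * N] = h0
rec = sum(f * np.roll(base, m * N) ** 2 for m in range(L // N))
print(np.allclose(rec, f))  # True
```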
Proof of Proposition 4. Since $\hat{g}_k$ is obtained by sampling with sampling period $L/2N$, the inverse Fourier transform $g_k$ of $\hat{g}_k$ is given by the following periodization of $g_k^L$:

$$g_k[l] = \sum_{j=0}^{L/2N-1} g_k^L[l + j\cdot 2N]. \tag{18}$$

Recall from (14) that the CQ-NSGT coefficients of $f$ with respect to $\mathcal{G}(g^L,a)$ are given by $c_{n,k} = \langle f, T_{na_k} g_k^L\rangle$. The CQ-NSGT coefficients $c^m$ of $f^m$ are, for $m = 0,\dots,L/N-1$, $n = 0,\dots,2N/a_k-1$ and $k \in I_K$,

$$c^m_{n,k} = \Big\langle f h_m, \sum_{j=0}^{L/2N-1} T_{na_k + (m-1+2j)N}\, g_k^L \Big\rangle, \tag{19}$$

where the final inner product is taken over $\mathbb{C}^L$. Observe that every $n = 0,\dots,L/a_k-1$ can be written as $n = mN/a_k + n^0$ with $n^0 \in \{0,\dots,N/a_k-1\}$, and thus

$$\begin{aligned}
c^m_{n^0+N/a_k,k} + c^{m+1}_{n^0,k} &= \Big\langle f, (h_m + h_{m+1}) \sum_{j=0}^{L/2N-1} T_{n^0a_k+(m+2j)N}\, g_k^L \Big\rangle \\
&= \big\langle f, T_{n^0a_k+mN}\, g_k^L \big\rangle + R[n] \\
&= c_{n,k} + R[n].
\end{aligned} \tag{20}$$

Here,

$$R[n] = \big\langle f, (h_m + h_{m+1} - 1)\, T_{n^0a_k+mN}\, g_k^L \big\rangle + \Big\langle f, (h_m + h_{m+1}) \sum_{j=1}^{L/2N-1} T_{n^0a_k+(m+2j)N}\, g_k^L \Big\rangle. \tag{21}$$

Hence $c^m_{n^0+N/a_k,k} + c^{m+1}_{n^0,k} - c_{n,k} = R[n]$. The result follows from the Cauchy-Schwarz inequality, applied to the case $m = 0$, observing that the bound is independent of $m$. □
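The periodization identity (18) can be checked with a few lines of NumPy. The sketch samples the DFT of a length-$L$ filter every $L/2N$ bins, takes a length-$2N$ inverse DFT and compares the result with the periodization of the filter. NumPy's convention (`np.fft.ifft` divides by the length) is assumed. With that convention, (18) holds without an extra scaling factor. The sizes and filters are hypothetical. The last two lines show that a filter supported on fewer than $2N$ samples is left unchanged, while a longer one wraps around. This wrap-around is the source of the circular overspill.

```python
import numpy as np


def sampled_window_time(g_long, two_n):
    """Inverse DFT of hat{g}^L sampled every L/(2N) bins (a length-2N vector)."""
    L = len(g_long)
    step = L // two_n
    return np.fft.ifft(np.fft.fft(g_long)[::step])


def periodize(g_long, two_n):
    """Right-hand side of (18): sum over j of g^L[l + 2N j], for l = 0..2N-1."""
    # reshape(-1, 2N)[j, l] = g^L[l + 2N j]; summing over j periodizes.
    return g_long.reshape(-1, two_n).sum(axis=0)


L, two_n = 96, 24
rng = np.random.default_rng(3)
g_long = rng.standard_normal(L) + 1j * rng.standard_normal(L)
print(np.allclose(sampled_window_time(g_long, two_n), periodize(g_long, two_n)))  # True

short = np.zeros(L)
short[:20] = 1.0   # support shorter than 2N: no wrap-around
long_ = np.zeros(L)
long_[:40] = 1.0   # support longer than 2N: time aliasing
print(np.allclose(periodize(short, two_n), short[:two_n]))  # True
print(np.allclose(periodize(long_, two_n), long_[:two_n]))  # False
```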